The EDL Pipeline follows a strict 6-phase architecture designed to orchestrate 16+ data fetching and processing scripts in the correct dependency order. This page explains why order matters, how phases depend on each other, and the configuration options available.

Architecture Overview

The pipeline is orchestrated by run_full_pipeline.py, which executes scripts sequentially across six phases:

Phase Breakdown

PHASE 1: Core Data (Foundation)

Creates the foundational datasets that all other scripts depend on.
| Script | Output | Purpose |
| --- | --- | --- |
| fetch_dhan_data.py | dhan_data_response.json, master_isin_map.json | Fetches 2,775 stocks and creates ISIN mapping |
| fetch_fundamental_data.py | fundamental_data.json | Quarterly results & financial ratios (35 MB) |
| NSE CSV Download | nse_equity_list.csv | Listing dates for all stocks |
Critical Dependency: master_isin_map.json is used by ALL scripts in Phase 2, 2.5, and 4. If fetch_dhan_data.py fails, the pipeline cannot continue.
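
A fail-fast check on the Phase 1 outputs could look like the sketch below. The file names come from the Phase 1 table; the helper name `missing_phase1_outputs` and the idea of keeping all files in one data directory are assumptions for illustration, not taken from run_full_pipeline.py.

```python
from pathlib import Path

# File names are from the Phase 1 table; the helper itself is hypothetical.
PHASE1_OUTPUTS = [
    "dhan_data_response.json",
    "master_isin_map.json",
    "fundamental_data.json",
    "nse_equity_list.csv",
]


def missing_phase1_outputs(data_dir: Path) -> list[str]:
    """Return the Phase 1 files that are absent or empty in data_dir."""
    missing = []
    for name in PHASE1_OUTPUTS:
        path = data_dir / name
        # An empty file is treated like a missing one: a failed fetch
        # may leave a zero-byte output behind.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling this before Phase 2 and aborting when `master_isin_map.json` is in the returned list would make the dependency explicit instead of letting downstream scripts fail one by one.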

PHASE 2: Data Enrichment (Fetching)

Parallel execution of 11 data fetching scripts, all consuming master_isin_map.json.
| Script | Output | Description |
| --- | --- | --- |
| fetch_company_filings.py | company_filings/{SYMBOL}_filings.json | Hybrid LODR + Legacy filings |
| fetch_new_announcements.py | all_company_announcements.json | Live corporate announcements |
| fetch_advanced_indicators.py | advanced_indicator_data.json | Pivot Points, EMA/SMA signals (8.3 MB) |
| fetch_market_news.py | market_news/{SYMBOL}_news.json | AI-sentiment news (50/stock) |
| fetch_corporate_actions.py | upcoming_corporate_actions.json, history_corporate_actions.json | Dividends, Bonus, Splits (2 years history + 2 months ahead) |
| fetch_surveillance_lists.py | nse_asm_list.json, nse_gsm_list.json | ASM/GSM surveillance lists |
| fetch_circuit_stocks.py | upper_circuit_stocks.json, lower_circuit_stocks.json | Circuit breaker stocks |
| fetch_bulk_block_deals.py | bulk_block_deals.json | Bulk/Block deals (30 days) |
| fetch_incremental_price_bands.py | incremental_price_bands.json | Daily price band changes |
| fetch_complete_price_bands.py | complete_price_bands.json | All securities price bands |
| fetch_all_indices.py | all_indices_list.json | 194 market indices |

PHASE 2.5: OHLCV Data (Smart Incremental)

Optional phase controlled by the FETCH_OHLCV flag. Downloads lifetime historical OHLCV data with intelligent incremental updates.

| Script | Output | Performance |
| --- | --- | --- |
| fetch_all_ohlcv.py | ohlcv_data/{SYMBOL}.csv | ~2-5 min incremental, ~30 min first-time |
| fetch_indices_ohlcv.py | ohlcv_data/indices/{INDEX}.csv | High-speed specialized fetcher |
Incremental Logic: Only downloads missing dates if CSV exists, full history otherwise.
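
The incremental rule can be sketched as a function that decides where the next download should start. This is an illustration only: the `Date` column name, ISO-formatted dates, and the function name `next_fetch_start` are assumptions, since the actual CSV layout written by fetch_all_ohlcv.py is not described here.

```python
import csv
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def next_fetch_start(csv_path: Path, date_column: str = "Date") -> Optional[date]:
    """Return the first date still to download, or None to request full history."""
    if not csv_path.is_file():
        return None  # no CSV yet: fetch the full history

    last_seen = None
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            value = row.get(date_column)
            if value:
                # Keep only the YYYY-MM-DD part in case a time is appended.
                current = date.fromisoformat(value[:10])
                if last_seen is None or current > last_seen:
                    last_seen = current

    if last_seen is None:
        return None  # header-only or empty file: treat as first-time fetch
    return last_seen + timedelta(days=1)
```

A fetcher built this way would request data from the returned date up to today and append the new rows, which is consistent with the much shorter runtime quoted for incremental runs.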
If FETCH_OHLCV = False, the following fields will be zero in the final output:
  • ADR (Average Daily Range)
  • RVOL (Relative Volume)
  • ATH & % from ATH
  • All turnover metrics
  • Post-earnings returns

PHASE 3: Base Analysis (Building Master JSON)

A single critical script, bulk_market_analyzer.py, produces the base structure of all_stocks_fundamental_analysis.json.
Inputs:
  • fundamental_data.json (Phase 1)
  • dhan_data_response.json (Phase 1)
  • advanced_indicator_data.json (Phase 2)
  • nse_equity_list.csv (Phase 1)
Outputs:
  • all_stocks_fundamental_analysis.json (Base structure with ~60 fields)
Processing:
  1. Loads fundamental data for all 2,775 stocks
  2. Merges technical data from Dhan response
  3. Adds advanced indicators (Pivots, SMA/EMA status)
  4. Calculates QoQ/YoY growth metrics
  5. Computes valuation ratios (P/E, PEG, ROE, ROCE, D/E)
  6. Adds shareholding patterns (FII/DII changes, Free Float)
This script MUST complete successfully before Phase 4. All Phase 4 scripts modify this JSON file in-place.
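
As an illustration of step 4, QoQ and YoY growth are commonly computed as percentage changes against the previous quarter and the same quarter one year earlier. The sketch below assumes that convention and an oldest-to-newest list of quarterly values; the function names and output keys are hypothetical and may differ from what the Phase 3 script actually does.

```python
from typing import Optional


def pct_growth(current: Optional[float], previous: Optional[float]) -> Optional[float]:
    """Percentage change from previous to current, or None if undefined."""
    if current is None or previous is None or previous == 0:
        return None
    # abs() keeps the sign meaningful when the base period was negative
    # (for example, moving from a loss to a profit).
    return round((current - previous) / abs(previous) * 100, 2)


def growth_metrics(quarterly: list[float]) -> dict:
    """quarterly is ordered oldest to newest, one value per quarter."""
    latest = quarterly[-1] if quarterly else None
    prev_quarter = quarterly[-2] if len(quarterly) >= 2 else None
    year_ago = quarterly[-5] if len(quarterly) >= 5 else None
    return {
        "qoq_pct": pct_growth(latest, prev_quarter),
        "yoy_pct": pct_growth(latest, year_ago),
    }
```

Returning None rather than 0 for an undefined ratio is a design choice of this sketch; it keeps "no data" distinguishable from "no growth".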

PHASE 4: Enrichment Injection (Order Matters!)

Six scripts that sequentially inject additional fields into all_stocks_fundamental_analysis.json.
CRITICAL: These scripts MUST run in this exact order. Each modifies the JSON file in-place.
| Order | Script | Fields Added | Dependencies |
| --- | --- | --- | --- |
| 1 | advanced_metrics_processor.py | ADR, RVOL, ATH, Turnover, Gap Up %, Day Range % | ohlcv_data/ |
| 2 | process_earnings_performance.py | Quarterly Results Date, Returns since Earnings, Max Returns since Earnings | company_filings/, ohlcv_data/ |
| 3 | enrich_fno_data.py | F&O Flag, Lot Size, Next Expiry | F&O data fetchers |
| 4 | process_market_breadth.py | Relative Strength Rating, Market Breadth metrics | Returns data from base analysis |
| 5 | process_historical_market_breadth.py | Historical breadth charts | OHLCV data |
| 6 | add_corporate_events.py | Event Markers, Recent Announcements, News Feed | ALL Phase 2 outputs |
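
Because every Phase 4 script rewrites the same file, each step reads the result of the previous one. A minimal sketch of one such in-place step follows. It assumes the master JSON is a list of per-stock objects and writes through a temporary file so that an interrupted run does not leave a half-written file; both the layout and the helper name `enrich_in_place` are assumptions, not the pipeline's confirmed implementation.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def enrich_in_place(json_path: Path, add_fields: Callable[[dict], dict]) -> int:
    """Apply add_fields to every stock record and atomically rewrite the file."""
    records = json.loads(json_path.read_text(encoding="utf-8"))
    for record in records:
        # add_fields returns only the new keys; existing keys are preserved.
        record.update(add_fields(record))

    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=str(json_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(records, handle)
    os.replace(tmp_name, json_path)
    return len(records)
```

Running two such steps back to back shows why order matters: the second call sees the fields added by the first, so a script that needs earlier fields has to run after the script that adds them.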

PHASE 5: Compression

Compresses final outputs to .json.gz format with maximum compression.
Files Compressed:
  • all_stocks_fundamental_analysis.json → .json.gz (~80% smaller)
  • sector_analytics.json → .json.gz
  • market_breadth.csv → .json.gz
Typical Results:
  • Raw JSON: ~35-40 MB
  • Compressed: ~7-8 MB
  • Compression ratio: 80%+
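
Maximum gzip compression can be reproduced with Python's standard gzip module at compresslevel=9. The sketch below is an assumption about how such a step might be written, not the pipeline's actual code; the helper name `compress_to_gz` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_to_gz(source: Path) -> float:
    """Write source + '.gz' at maximum compression; return percent size reduction."""
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as raw, gzip.open(target, "wb", compresslevel=9) as packed:
        # Stream in chunks so a 35-40 MB JSON file is never fully held in memory.
        shutil.copyfileobj(raw, packed)

    original_size = source.stat().st_size
    if original_size == 0:
        return 0.0
    return round((1 - target.stat().st_size / original_size) * 100, 1)
```

With the sizes quoted above, a 38 MB input compressed to 7.5 MB gives (1 - 7.5 / 38) × 100 ≈ 80.3% reduction, which matches the stated 80%+ ratio.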

PHASE 6: Optional Standalone Data

Controlled by the FETCH_OPTIONAL flag. Produces standalone datasets not included in the master JSON.

| Script | Output | Description |
| --- | --- | --- |
| fetch_all_indices.py | all_indices_list.json | 194 market indices |
| fetch_etf_data.py | etf_data_response.json | 361 ETF details |
Note: These are standalone products and not consumed by the master pipeline.

Configuration Flags

Edit these flags inside run_full_pipeline.py (lines 60-71):
# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True
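
To show how the flags shape a run, the following sketch builds the list of phases that would execute for a given flag combination. FETCH_OHLCV and FETCH_OPTIONAL are the flags described on this page; the function `build_phase_plan` is hypothetical and only mirrors the phase descriptions, not the internals of run_full_pipeline.py.

```python
def build_phase_plan(fetch_ohlcv: bool, fetch_optional: bool) -> list[str]:
    """Return the phases that would run, in order, for the given flags."""
    plan = ["Phase 1: Core Data", "Phase 2: Data Enrichment"]
    if fetch_ohlcv:
        # Phase 2.5 is skipped entirely when FETCH_OHLCV is False.
        plan.append("Phase 2.5: OHLCV Data")
    plan += [
        "Phase 3: Base Analysis",
        "Phase 4: Enrichment Injection",
        "Phase 5: Compression",
    ]
    if fetch_optional:
        # Phase 6 outputs are standalone and do not feed the master JSON.
        plan.append("Phase 6: Optional Standalone Data")
    return plan
```

For example, `build_phase_plan(False, False)` yields the five mandatory phases only, which corresponds to the minimal run.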

Impact of Configuration

| Setting | Impact | Runtime | Output Fields |
| --- | --- | --- | --- |
| True | Full OHLCV download + incremental updates | +2-30 min | All 86 fields populated |
| False | Skip OHLCV entirely | Faster (~4 min total) | 15+ fields will be zero |
Zero Fields when False:
  • ADR (5/14/20/30 Days MA)
  • RVOL
  • ATH, % from ATH
  • Gap Up %, Day Range %
  • % from 52W Low
  • 6 Month Returns
  • 200 Days EMA Volume
  • Daily Rupee Turnover (20/50/100)
  • 30 Days Average Rupee Volume
  • Returns since Earnings
  • Max Returns since Earnings

Error Handling Strategy

The pipeline implements a resilient continuation strategy:
1. Critical Failures

If fetch_dhan_data.py (Phase 1) or bulk_market_analyzer.py (Phase 3) fails, the pipeline stops immediately. These scripts produce the master ISIN map and base JSON that all other scripts depend on.
2. Enrichment Failures

If any Phase 2 or Phase 4 script fails, the pipeline continues and marks the script as failed. This ensures you still get a final output even if individual data sources are temporarily unavailable; the fields supplied by the failed script will be missing from it.
3. Final Report

At completion, the pipeline reports:
  • Total runtime
  • Successful scripts count
  • Failed scripts list
  • Output file size and compression ratio
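
The continuation strategy above can be sketched as a small runner. The two critical script names come from this page; the `run_scripts` function, its `runner` parameter, and the shape of the returned report are assumptions made for illustration and are not taken from run_full_pipeline.py.

```python
import subprocess
import sys
import time

# Named on this page as the scripts whose failure stops the pipeline.
CRITICAL_SCRIPTS = {"fetch_dhan_data.py", "bulk_market_analyzer.py"}


def _run_as_subprocess(script_name: str) -> bool:
    """Default runner: execute the script and report whether it exited cleanly."""
    return subprocess.run([sys.executable, script_name]).returncode == 0


def run_scripts(scripts, runner=_run_as_subprocess):
    """Run scripts in order; stop on a critical failure, otherwise continue."""
    started = time.monotonic()
    succeeded, failed = [], []
    aborted = False
    for name in scripts:
        try:
            ok = runner(name)
        except Exception:
            ok = False  # a crashing runner counts as a failed script
        (succeeded if ok else failed).append(name)
        if not ok and name in CRITICAL_SCRIPTS:
            aborted = True
            break
    return {
        "succeeded": succeeded,
        "failed": failed,
        "aborted": aborted,
        "runtime_s": time.monotonic() - started,
    }
```

Passing a custom `runner` makes the control flow testable without launching real scripts, and the returned dictionary carries the same items as the final report: runtime, successful scripts, and failed scripts.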

Performance Characteristics

Minimal Run

  • Configuration: FETCH_OHLCV = False
  • Runtime: ~4 minutes
  • Output: 60+ fields per stock (missing volume/volatility metrics)

Full Run

  • Configuration: FETCH_OHLCV = True (incremental)
  • Runtime: ~6-9 minutes
  • Output: All 86 fields per stock

First-Time Full Run

  • Configuration: FETCH_OHLCV = True (no existing data)
  • Runtime: ~35-40 minutes
  • Output: All 86 fields + complete OHLCV history

With Cleanup

  • Configuration: CLEANUP_INTERMEDIATE = True
  • Disk Saved: ~150-200 MB
  • Retained: Only .json.gz + ohlcv_data/

Next Steps

  • Data Flow: Understand how data transforms across phases
  • Output Schema: Explore the 86 fields in the final JSON
  • Quick Start: Run your first pipeline
  • Configuration: Customize pipeline behavior